A new approach for modeling OOV words

Authors

  • Weimin Ren
  • Chengfa Wang
  • Wen Gao
  • Jinpei Xu
Abstract

This paper addresses the problem of out-of-vocabulary (OOV) utterance detection in a small-vocabulary telephone keyword spotting system. We propose a new approach for modeling OOV words in this setting. The keywords are modeled with semi-continuous hidden Markov models with multiple codebooks. We propose a two-pass procedure to spot true keyword occurrences. In the first pass, a standard Viterbi search is applied, using appropriately defined and trained garbage models and silence models. This stage outputs the N-best word hypotheses. The second pass, which can be seen as a verification procedure, takes the first-pass output as its focus. It constructs a "dynamic anti-model" based on the hypothesized keyword model and the current input acoustic information.
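The abstract does not say how the dynamic anti-model is built or how its score is compared with the keyword score. As a rough sketch only, the second (verification) pass could be pictured as a duration-normalized log-likelihood ratio test over the first-pass N-best hypotheses. That test is a common verification scheme and is an assumption here, not the authors' stated method. The names `Hypothesis`, `verify_hypotheses`, `anti_loglik` and the threshold value are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One first-pass keyword hypothesis (hypothetical structure)."""
    keyword: str
    start_frame: int       # inclusive
    end_frame: int         # exclusive
    keyword_loglik: float  # segment log-likelihood under the keyword model
    anti_loglik: float     # segment log-likelihood under an anti-model (assumed given)


def verify_hypotheses(hypotheses: List[Hypothesis],
                      threshold: float = 0.5) -> List[Hypothesis]:
    """Keep hypotheses whose per-frame log-likelihood ratio reaches the threshold.

    Assumption: the keyword and anti-model scores are already computed;
    how the paper derives the anti-model score is not described in the abstract.
    """
    accepted = []
    for hyp in hypotheses:
        n_frames = hyp.end_frame - hyp.start_frame
        if n_frames <= 0:
            continue  # skip degenerate segments
        # Normalize by duration so long and short keywords are comparable.
        llr = (hyp.keyword_loglik - hyp.anti_loglik) / n_frames
        if llr >= threshold:
            accepted.append(hyp)
    return accepted
```

A hypothesis that the keyword model explains much better than the anti-model is kept as a keyword occurrence; the rest are treated as OOV or false alarms. The threshold would need tuning on held-out data.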

Similar resources

Modeling Out-of-vocabulary Words for Robust Speech Recognition

In this paper we present an approach for modeling and recognizing out-of-vocabulary (OOV) words in a single stage recognizer. A word-based recognizer is augmented with an extra OOV word model, which enables the OOV word to be predicted by a word-based language model. The OOV model itself is phone-based, so that an OOV word can be realized as an arbitrary sequence of phones. A phone bigram is use...

Recognition of out-of-vocabulary words with sub-lexical language models

A major source of recognition errors, out-of-vocabulary (OOV) words are also semantically important; recognizing them is, therefore, crucial for understanding. Success, so far, has been modest, even on very constrained tasks. In this paper we present a new approach to unlimited vocabulary speech recognition based on using grapheme-to-phoneme correspondences for sub-lexical modeling of OOV words,...

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Modelling Out-of-Vocabulary Words for Robust Speech Recognition

This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognizer erroneously substitutes the OOV word with a similarly sounding word from its vocabulary. Furthe...

Lexical out-of-vocabulary models for one-stage speech interpretation

We present an approach to explicit, statistical, lexical-level out-of-vocabulary (OOV) word modeling for direct integration into the search space of a one-stage speech interpretation system. For this purpose, a generic pronunciation model for unknown words is derived from large pronunciation lexica and, optionally, word frequency knowledge. Known statistical language modeling (LM) methods are u...

Publication date: 2000